Revised classification of kinases based on bioactivity data: the importance of data density and choice of visualization
Abstract
Kinases are a major class of drug targets and are involved in a variety of diseases such as diabetes, cancer and inflammation. Still, understanding kinase inhibitor selectivity and promiscuity remains a major challenge. In order to improve upon the current situation, we analyzed a dataset comprising 157 compounds, tested at concentrations of 1 μM and 10 μM against a panel of 225 human protein kinases. Our bioactivity-based classification of kinases shows similarities with the Sugen sequence-based classification [1], where particularly kinases from the TK, CDK, CLK and AGC groups cluster together. However, 57% of all kinase pairs inhibited by 6 known inhibitors consist of kinases which lie far apart from each other in the Sugen tree (relative distance of 0.6–0.8 on a scale from 0 to 1), but are correctly located closer to each other in our bioactivity-based tree (distance 0–0.4). For 80% of all analyzed kinases, those classified as neighbors according to the bioactivity-based classification also show high similarity in shared active compounds. However, among the remaining ~20%, distant kinases did not necessarily show low SAR similarity, and neighboring kinases did not necessarily show high SAR similarity; i.e., the placement in the tree was misleading. We identified two reasons for this: firstly, ‘misplaced’ kinases exhibit inconsistent SAR, and secondly, these kinases had only a few shared activities with other kinases, making both the computation of their bioactivity-based distance and their place in the tree less accurate. In a follow-up analysis, we resolved both problems by visualizing inconsistent SAR more accurately using MDS plots, rather than phylogenetic trees, and by excluding kinases with 16 or fewer shared activities. Only 7 kinases (4% of the kinases analyzed) did not show a clear relationship between kinase bioactivity profile similarity and shared active compounds.
Hence, this analysis improves on previous studies, where the influence of data density on kinase similarity was not considered, and leads to a more reliable placement of kinases into the kinome tree. Overall, our analysis suggests that bioactivity-based classification of kinases is indeed more useful than sequence-based classification for predicting kinase-inhibitor interactions. However, care needs to be taken with respect to data density (i.e., kinases with too few data points need to be omitted) and visualization of the data (i.e., phylogenetic trees imply a neighborhood relationship that is not consistently observed in every case).
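The two remedies described above (omitting kinases with too few shared data points, and using MDS instead of a tree) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Tanimoto-style distance, the reading of "shared activities" as compounds measured against both kinases, the greedy removal rule, and the function names `bioactivity_distance`, `drop_sparse_kinases` and `classical_mds` are all assumptions made for illustration. Only the threshold (more than 16 shared activities) comes from the abstract.

```python
import numpy as np

def bioactivity_distance(active, tested, min_shared=17):
    """Pairwise distance between kinases from boolean activity profiles.

    active, tested: boolean arrays of shape (n_kinases, n_compounds).
    Assumption: distance = 1 - Tanimoto similarity of the active sets,
    restricted to compounds measured against both kinases. Pairs with
    fewer than `min_shared` shared measurements are left as NaN.
    """
    n = active.shape[0]
    dist = np.full((n, n), np.nan)
    for i in range(n):
        for j in range(n):
            both = tested[i] & tested[j]
            if both.sum() < min_shared:
                continue  # too sparse to estimate a distance
            a, b = active[i] & both, active[j] & both
            union = (a | b).sum()
            dist[i, j] = 0.0 if union == 0 else 1.0 - (a & b).sum() / union
    return dist

def drop_sparse_kinases(dist):
    """Greedily remove the kinase with the most undefined distances
    until the remaining distance matrix is complete (hypothetical rule)."""
    keep = list(range(dist.shape[0]))
    while keep:
        missing = np.isnan(dist[np.ix_(keep, keep)]).sum(axis=1)
        if missing.max() == 0:
            break
        keep.pop(int(missing.argmax()))
    return keep

def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS: embed a complete distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Toy data (invented): 4 kinases, 20 compounds; kinase 3 is sparsely tested.
tested = np.ones((4, 20), dtype=bool)
tested[3, 5:] = False
active = np.zeros((4, 20), dtype=bool)
active[0, :10] = True
active[1, :10] = True
active[2, 10:] = True
active[3, :3] = True

dist = bioactivity_distance(active, tested)
keep = drop_sparse_kinases(dist)
coords = classical_mds(dist[np.ix_(keep, keep)])
```

In this toy case the sparsely tested kinase is removed before embedding, so its poorly supported distances cannot distort the layout of the remaining kinases. Unlike a tree, the MDS coordinates do not force each kinase into a single neighborhood.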